Structure Annotation in the Polish Corpus of Suicide Notes

Authors

  • Michal Marcinczuk
  • Monika Zasko-Zielinska
  • Maciej Piasecki

Abstract

The Polish Corpus of Suicide Notes (henceforth PCSN) was constructed to meet the needs of forensic linguistics. Suicide notes are messages written in a borderline situation, shortly before death. Hence the annotation schema requires a complex description of the document structure, the textual content, and its linguistic properties. TEI was selected as the basis for the document encoding schema. The adaptation and extension of TEI are discussed with examples, with respect to such aspects of encoding as letter structure, various layers of changes and omissions, error correction, and extra-linguistic elements.
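
The abstract does not reproduce the PCSN schema itself. As a rough illustration only, the Python sketch below encodes a short, invented note with standard TEI P5 elements that cover the aspects listed above. It uses `<opener>`/`<closer>` for letter structure, `<choice>`/`<sic>`/`<corr>` for error correction, `<del>`/`<add>` for a layer of authorial changes, and `<gap>` for an omission. It then derives a diplomatic and a corrected reading from the same markup. The element and attribute choices, the sample text, and the helper names `build_note` and `reading` are assumptions made for this example; they are not the corpus's actual encoding.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Return a namespace-qualified TEI tag name."""
    return f"{{{TEI_NS}}}{tag}"


def build_note() -> ET.Element:
    """Encode a short, invented note as a TEI <text> fragment (hypothetical example)."""
    text = ET.Element(tei("text"))
    body = ET.SubElement(text, tei("body"))
    letter = ET.SubElement(body, tei("div"), {"type": "letter"})

    # Letter structure: salutation in <opener>, signature in <closer>.
    opener = ET.SubElement(letter, tei("opener"))
    ET.SubElement(opener, tei("salute")).text = "Dear X,"

    p = ET.SubElement(letter, tei("p"))
    p.text = "I left the "
    # Error correction: the original spelling and its normalized form.
    choice = ET.SubElement(p, tei("choice"))
    ET.SubElement(choice, tei("sic")).text = "kyes"
    ET.SubElement(choice, tei("corr")).text = "keys"
    choice.tail = " on the "
    # Layer of changes: a struck-out word and its interlinear replacement.
    ET.SubElement(p, tei("del"), {"rend": "strikethrough"}).text = "chair"
    added = ET.SubElement(p, tei("add"), {"place": "above"})
    added.text = "table"
    added.tail = " "
    # Omission: an illegible passage that was not transcribed.
    ET.SubElement(p, tei("gap"), {"reason": "illegible"}).tail = "."

    closer = ET.SubElement(letter, tei("closer"))
    ET.SubElement(closer, tei("signed")).text = "Y"
    return text


def reading(elem: ET.Element, skip: set) -> str:
    """Flatten mixed content, dropping elements whose local name is in `skip`.

    Tails are always kept, because in ElementTree the tail belongs to the
    surrounding text, not to the element itself. <gap> is shown as "[...]".
    """
    local = elem.tag.split("}")[-1]
    parts = []
    if local == "gap":
        parts.append("[...]")
    elif local not in skip:
        parts.append(elem.text or "")
        parts.extend(reading(child, skip) for child in elem)
    parts.append(elem.tail or "")
    return "".join(parts)


root = build_note()
para = root.find(f".//{tei('p')}")
print(reading(para, {"corr", "add"}))  # → I left the kyes on the chair [...].
print(reading(para, {"sic", "del"}))   # → I left the keys on the table [...].
print(ET.tostring(root, encoding="unicode"))
```

Keeping both the original and the corrected form inside `<choice>`, and keeping deletions and additions as separate elements, means neither the writer's text nor the editor's normalization is lost. Either reading can be generated on demand.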

Similar articles

Recognition of Genuine Polish Suicide Notes

In this article we present the results of research on the recognition of genuine Polish suicide notes (SNs). We provide a useful method for distinguishing SNs from other types of discourse, including counterfeited SNs. The method uses a wide range of word-based and semantic features, and it was evaluated on the Polish Corpus of Suicide Notes, which contains 1244 genuine SNs, expanded with a m...

Lexicons and Grammars for Named Entity Annotation in the National Corpus of Polish

We present initial results in the named entity annotation subtask of a project aiming at creating the National Corpus of Polish. We summarize the annotation requirements defined for this corpus, and we discuss how existing lexical resources and grammars for Polish named entities have been adapted to meet those requirements. We show the first results of the corpus annotation using the information extra...

The Design of Syntactic Annotation Levels in the National Corpus of Polish

This paper presents the procedure of the syntactic annotation of the National Corpus of Polish. Syntactic annotation consists here of shallow parsing and manual post-editing of the results by annotators. The description concentrates on the delimitation of syntactic words and groups, as well as on problems encountered during the annotation process.

The design of Polish Speech Corpus for Unit Selection Speech Synthesis

The Bonn Open Synthesis System (BOSS) is open-source software for unit selection speech synthesis that has been used for the generation of high-quality German and Dutch speech. This article presents ongoing research and development aimed at adapting BOSS to the Polish language. In the first section, the origins and workings of the unit selection method for speech synthesis are explained. Sectio...

Automatic Detection of Annotation Errors in Polish-Language Corpora

In this article we propose an extension to the variation n-gram based method of detecting annotation errors. We also show an approach to finding anomalies in the morphosyntactic annotation layer by using association rule discovery. As no research has previously been done in the field of morphosyntactic annotation error correction for Polish, we provide novel results based on experiments on the large...

Publication date: 2011